Abstract

A Fortran-77 program for goodness of fit tests for histograms with weighted
entries as well as with unweighted entries is presented. The code calculates
test statistics for histograms with normalized weights of events and for
histograms with unnormalized weights of events.
Computer: Any Unix/Linux workstation or PC with a Fortran-77 compiler
Classification: 4.13, 11.9, 16.4, 19.4 

External routines/libraries used: FPLSOR (M103) from CERN Program Library 
Nature of problem: The program calculates goodness of fit test statistics for 
weighted histograms 

Solution method: Calculation of test statistics is done according to the formulas
presented in Ref. [1].
1. Introduction

A histogram with $m$ bins for a given probability density function $p(x)$ is
used to estimate the probabilities

$p_i = \int_{S_i} p(x)\,dx, \quad i = 1, \ldots, m$    (1)

that a random event belongs to bin $i$. Integration in (1) is done over the bin
$S_i$.

A histogram can be obtained as a result of a random experiment with
probability density function $p(x)$. Let us denote the number of random events
belonging to the $i$th bin of the histogram as $n_i$. The total number of events
in the histogram is equal to $n = \sum_{i=1}^{m} n_i$. The quantity $\hat{p}_i = n_i/n$ is an
estimator of $p_i$ with expectation value $E\,\hat{p}_i = p_i$.

The problem of goodness of fit is to test the hypothesis

$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0}$ vs. $H_a: p_i \neq p_{i0}$ for some $i$,    (2)

where $p_{i0}$ are specified probabilities, and $\sum_{i=1}^{m} p_{i0} = 1$. The test is used in
data analysis to compare the theoretical frequencies $n p_{i0}$ with the observed
frequencies $n_i$. The test statistic

$X^2 = \sum_{i=1}^{m} \frac{(n_i - n p_{i0})^2}{n p_{i0}}$    (3)

was suggested by Pearson [2]. Pearson showed that the statistic (3) has
approximately a $\chi^2_{m-1}$ distribution if the hypothesis $H_0$ is true.
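
As a minimal illustration of statistic (3), the following Python sketch evaluates Pearson's sum for observed counts and hypothesized probabilities. The function name `pearson_statistic` is hypothetical and is not part of the program described in this paper; the sample numbers are those of the unweighted histogram of Test 1 in Section 3.

```python
def pearson_statistic(counts, p0):
    """Pearson's X^2 of eq. (3) for observed bin counts and probabilities p0."""
    if len(counts) != len(p0):
        raise ValueError("counts and p0 must have the same number of bins")
    n = sum(counts)  # total number of events
    # Sum of squared deviations from the expected counts n * p_i0
    return sum((n_i - n * p) ** 2 / (n * p) for n_i, p in zip(counts, p0))


# Unweighted histogram of Test 1 (Section 3), with probabilities rounded
# to four digits as printed there
counts = [26, 115, 454, 183, 222]
p0 = [0.0296, 0.1106, 0.4460, 0.2067, 0.2072]
x2 = pearson_statistic(counts, p0)
print(f"X^2 = {x2:.2f}, ndf = {len(counts) - 1}")
```

With the rounded probabilities the result is about 4.53, close to the value STAT = 4.5291 reported for Test 1; the small difference is presumably due to the rounding of P.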

To define a weighted histogram let us write the probability $p_i$ for a
given probability density function $p(x)$ in the form

$p_i = \int_{S_i} p(x)\,dx = \int_{S_i} w(x)\,g(x)\,dx,$    (4)

where

$w(x) = p(x)/g(x)$    (5)
is the weight function and $g(x)$ is some other probability density function.
The function $g(x)$ must be $> 0$ for points $x$ where $p(x) \neq 0$. The weight
$w(x) = 0$ if $p(x) = 0$, see Ref. [?]. Because of the condition $\sum_i p_i = 1$, we will
call the weights defined above normalized weights, as opposed to the
unnormalized weights $\check{w}(x)$, which are $\check{w}(x) = \mathrm{const} \cdot w(x)$.

The histogram with normalized weights is obtained from a random
experiment with a probability density function $g(x)$, and the weights of the
events are calculated according to (5). Let us denote the total sum of the
weights of the events in the $i$th bin of the histogram as

$W_i = \sum_{k=1}^{n_i} w_i(k)$    (6)

and the total sum of squares of weights as

$W_{2i} = \sum_{k=1}^{n_i} w_i(k)^2$    (7)

where $n_i$ is the number of events in bin $i$ and $w_i(k)$ is the weight of the $k$th
event in the $i$th bin. The total number of events in the histogram is equal to
$n = \sum_{i=1}^{m} n_i$, where $m$ is the number of bins. The quantity $\hat{p}_i = W_i/n$ for the
histogram with normalized weights is the estimator of $p_i$ with the expectation
value $E\,\hat{p}_i = p_i$. Note that in the case where $g(x) = p(x)$, the weights of the
events are equal to 1 and the histogram with normalized weights is the usual
histogram with unweighted entries.
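
The sums (6) and (7) can be accumulated in a single pass over the events. The Python sketch below is only an illustration under stated assumptions: the function name `weighted_histogram`, the half-open bin convention and the toy data are all hypothetical and do not come from the program described here.

```python
def weighted_histogram(xs, weights, edges):
    """Return (W1, W2): per-bin sums of weights and of squared weights."""
    m = len(edges) - 1
    w1 = [0.0] * m
    w2 = [0.0] * m
    for x, w in zip(xs, weights):
        for i in range(m):
            # Assumed convention: half-open bins [edges[i], edges[i+1]);
            # the last bin also includes its upper edge. Events outside
            # the histogram range are ignored.
            if edges[i] <= x < edges[i + 1] or (i == m - 1 and x == edges[m]):
                w1[i] += w       # eq. (6)
                w2[i] += w * w   # eq. (7)
                break
    return w1, w2


# Toy data: five events in two bins on [0, 2]
xs = [0.2, 0.7, 1.1, 1.5, 1.9]
ws = [1.0, 0.5, 2.0, 1.0, 0.5]
w1, w2 = weighted_histogram(xs, ws, [0.0, 1.0, 2.0])
n = len(xs)
p_hat = [w / n for w in w1]  # estimator W_i / n, meaningful for normalized weights
print(w1, w2, p_hat)
```

When all weights equal 1, both returned lists reduce to the ordinary bin counts, which is the unweighted case mentioned above.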

For weighted histograms the problem of goodness of fit is again to test
the hypothesis

$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0}$ vs. $H_a: p_i \neq p_{i0}$ for some $i$,    (8)

where $p_{i0}$ are specified probabilities, and $\sum_{i=1}^{m} p_{i0} = 1$.

The test statistic that is a generalization of Pearson's statistic (3) was
proposed in [1] for histograms with normalized weights of entries as
well as with unnormalized weights of entries. A code for the calculation of
these test statistics is presented in this article. As shown in [1], if hypothesis
$H_0$ (8) is true then the statistic for a histogram with normalized weighted
entries has approximately a $\chi^2_{m-1}$ distribution and, for a histogram with
unnormalized weighted entries, a $\chi^2_{m-2}$ distribution.
Use of the proposed test is inappropriate if any expected count in a bin of the
histogram is below 1 or if the expected count is less than 5 in more than 20%
of the bins. This empirical restriction, known for the usual chi-square test [3],
is quite reasonable for weighted histograms.
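
The restriction above is easy to check before running the test. The following Python sketch assumes that the expected count in bin $i$ is taken as $n p_{i0}$; the helper name `expected_counts_ok` is hypothetical and is not part of the program described here.

```python
def expected_counts_ok(n, p0):
    """Rule of thumb: no expected count below 1, and at most 20% of the
    bins with an expected count below 5."""
    expected = [n * p for p in p0]  # assumed expected counts n * p_i0
    if any(e < 1.0 for e in expected):
        return False
    n_small = sum(1 for e in expected if e < 5.0)
    return n_small <= 0.2 * len(expected)


p0 = [0.0296, 0.1106, 0.4460, 0.2067, 0.2072]
print(expected_counts_ok(1000, p0))  # all expected counts are well above 5
print(expected_counts_ok(30, p0))    # first bin expects fewer than 1 event
```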

Information for readers. Recently, another paper dedicated to weighted
histograms has been published in "Computer Physics Communications", see
Ref. [?]. In it the same author presented a program for calculating test
statistics to compare a weighted histogram with an unweighted histogram, and
to compare two histograms with weighted entries. That test can be used for the
comparison of experimental data distributions with simulated data distributions
as well as for the comparison of two simulated data distributions.

2. Computer program 

CHIWEI is a subroutine that can be called from a Fortran program for the
calculation of test statistics.

Usage 

CALL CHIWEI(P, W1, W2, N, NCHA, MODE, STAT, NDF, IFAIL)

Input Data 

P - one-dimensional real array of probabilities $p_i$

W1 - one-dimensional array, sum of weights $W_i$ in each bin

W2 - one-dimensional array, sum of squares of weights $W_{2i}$ in each bin

N - number of events $n$

NCHA - number of bins $m$

MODE - must be equal to 1 for a histogram with normalized weights, and
equal to 2 for a histogram with unnormalized weights






Output data 



STAT - test statistic, which follows a chi-square distribution with NDF degrees
of freedom if hypothesis $H_0$ is true

NDF - number of degrees of freedom (equal to m - MODE)

IFAIL - will be > 0 if the calculation is not successful.

3. Test run 

We take a distribution

$p(x) \propto \frac{2}{(x-10)^2+1} + \frac{1}{(x-14)^2+1}$    (9)

defined on the interval [4, 16] and representing two so-called Breit-Wigner
peaks. Two cases of the probability density function $g(x)$ are considered:

$g_1(x) = p(x)$    (10)

$g_2(x) \propto \frac{2}{(x-9)^2+1} + \frac{2}{(x-15)^2+1}$    (11)

Distribution (10) gives an unweighted histogram, and the method then coincides
with Pearson's chi-square test. Distribution (11) has the same form of
parametrization as (9), but with different values of the parameters. Three
cases of histograms were considered: an unweighted histogram, a histogram with
weights $p(x)/g_2(x)$ and a histogram with unnormalized weights $2p(x)/g_2(x)$.
Histograms with 5 bins were created by simulating 1000 entries for each case.
The results of the calculations are presented below. Program PROB (G100)
[5] has been used for calculating p-values.
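
The bin probabilities used as input P in the tests below can be cross-checked analytically. The Python sketch that follows assumes the density $p(x) \propto 2/((x-10)^2+1) + 1/((x-14)^2+1)$ on [4, 16] and five bins of equal width; the equal-width binning and the function name `bin_probabilities` are assumptions made for this illustration.

```python
import math


def bin_probabilities(lo=4.0, hi=16.0, m=5):
    """Integrate the two-peak Breit-Wigner density over m equal bins."""
    def antiderivative(x):
        # Antiderivative of 2/((x-10)^2+1) + 1/((x-14)^2+1)
        return 2.0 * math.atan(x - 10.0) + math.atan(x - 14.0)

    width = (hi - lo) / m
    edges = [lo + i * width for i in range(m + 1)]
    areas = [antiderivative(edges[i + 1]) - antiderivative(edges[i])
             for i in range(m)]
    total = antiderivative(hi) - antiderivative(lo)  # normalization on [lo, hi]
    return [a / total for a in areas]


for p in bin_probabilities():
    print(f"{p:.4f}")
```

Under these assumptions the five values agree with the P row of the test inputs to the four digits printed there.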

Test 1 

INPUT 

P 0.0296 0.1106 0.4460 0.2067 0.2072 

W1 26.0000 115.0000 454.0000 183.0000 222.0000

W2 26.0000 115.0000 454.0000 183.0000 222.0000 
N 1000
NCHA 5 
MODE 1 



OUTPUT 

STAT 4.5291 (p-value=0.3391) 
NDF 4 
IFAIL 0

Test 2 

INPUT 

P 0.0296 0.1106 0.4460 0.2067 0.2072 

W1 36.0112 106.1355 458.3037 197.8123 205.7211

W2 28.2698 56.9601 938.7897 363.4649 172.2003 

N 1000 
NCHA 5 
MODE 1 

OUTPUT 

STAT 2.3380 (p-value=0.6738) 
NDF 4 
IFAIL 0

Test 3 

INPUT 

P 0.0296 0.1106 0.4460 0.2067 0.2072 

W1 72.0225 212.2710 916.6075 395.6246 411.4423

W2 113.0790 227.8403 3755.1587 1453.8595 688.8014 
N 1000 
NCHA 5 
MODE 2 

OUTPUT 
STAT 2.2398 (p-value=0.5241)
NDF 3 
IFAIL 0
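
The p-values quoted with STAT in the three tests can be cross-checked without the CERN Program Library. The Python sketch below is not the PROB routine; it is an independent calculation of the chi-square upper tail probability for integer degrees of freedom, using the standard recurrence for the regularized upper incomplete gamma function. The name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, ndf):
    """Upper tail probability of a chi-square variable with integer ndf >= 1."""
    if ndf < 1 or x < 0:
        raise ValueError("need ndf >= 1 and x >= 0")
    z = 0.5 * x
    if ndf % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < 0.5 * ndf:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# (STAT, NDF) pairs taken from Tests 1 to 3
for stat, ndf in [(4.5291, 4), (2.3380, 4), (2.2398, 3)]:
    print(f"STAT = {stat:.4f}, NDF = {ndf}, p-value = {chi2_sf(stat, ndf):.4f}")
```

The values obtained this way agree with the quoted p-values to about three decimal places; any difference in the last digit is presumably due to the rounding of STAT.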